Polynomial Tree Substitution Grammars: an efficient framework for Data-Oriented Parsing
نویسندگان
چکیده
Finding the most probable parse tree in the framework of Data-Oriented Parsing (DOP), a Stochastic Tree Substitution Parsing scheme developed by R. Bod (Bod 92), has proven to be NP-hard in the most general case (Sima’an 96a). However, introducing some a priori restrictions on the choice of the elementary trees (i.e. grammar rules) leads to interesting DOP instances with polynomial time-complexity. The purpose of this paper is to present such an instance, based on the minimal-maximal selection principle, and to evaluate its performances on two different corpora.
منابع مشابه
Inducing Compact but Accurate Tree-Substitution Grammars
Tree substitution grammars (TSGs) are a compelling alternative to context-free grammars for modelling syntax. However, many popular techniques for estimating weighted TSGs (under the moniker of Data Oriented Parsing) suffer from the problems of inconsistency and over-fitting. We present a theoretically principled model which solves these problems using a Bayesian non-parametric formulation. Our...
متن کاملEecient Disambiguation by Means of Stochastic Tree Substitution Grammars
In Stochastic Tree Substitution Grammars (STSGs), one parse(tree) of an input sentence can be generated by exponentially many derivations ; the probability of a parse is deened as the sum of the probabilities of its derivations. As a result, some methods of Stochastic Context-Free Grammars (SCFGs), e.g. the Viterbi algorithm for nding the most probable parse (MPP) of an input sentence, are not ...
متن کاملParsing with the Shortest Derivation
Common wisdom has it that the bias of stochastic grammars in favor of shorter derivations of a sentence is harmful and should be redressed. We show that the common wisdom is wrong for stochastic grammars that use elementary trees instead of context-free rules, such as Stochastic Tree-Substitution Grammars used by Data-Oriented Parsing models. For such grammars a non-probabilistic metric based o...
متن کاملParsing preferences with Lexicalized Trey Adjoining Grammars exploiting the derivation tree
Since Kimball (73) parsing preference principles such as "Right association" (RA) and "Minimal attachment" (MA) are often formulated with respect to constituent trees. We present 3 preference principles based on "derivation trees" within the framework of LTAGs. We argue they remedy some shortcomings of the former approaches and account for widely accepted heuristics (e.g. argument/modifier, idi...
متن کاملTabulation of Automata for Tree-Adjoining Languages
We propose a modular design of tabular parsing algorithms for treeadjoining languages. The modularity is made possible by a separation of the parsing strategy from the mechanism of tabulation. The parsing strategy is expressed in terms of the construction of a nondeterministic automaton from a grammar; three distinct types of automaton will be discussed. The mechanism of tabulation leads to the...
متن کامل